sugarglider: Create glyphmaps of spatio-temporal data

Spatiotemporal data visualization is crucial for identifying complex trends and anomalies over time and space. Traditional methods often lack the detail needed for in-depth analysis. The sugarglider package, an extension of ggplot2, enhances this visualization by enabling detailed exploration of temporal dynamics across various domains like climate and transportation. This paper showcases sugarglider’s ability to effectively interpret diverse datasets with its customizable geometries and interactive glyph maps, empowering researchers to uncover temporal and seasonal patterns in spatiotemporal data.

Maliny Po (Monash University) , Nathan Shuyuan Yang (Duke University) , H. Sherry Zhang (University of Texas at Austin) , Dianne Cook (Monash University)
2024-10-18

1 Introduction

Spatiotemporal data visualization enables users to explore events and interactions across both space and time, facilitating the discovery of patterns, anomalies, and relationships within the data. As Kim et al. (2017) explain, a natural approach to analyzing spatiotemporal data is to plot it on a map and then incorporate animation controls or small multiple views to visualize each time step. This method allows analysts to gain insights into both the spatial distribution of the data and the patterns or correlations that emerge over time within these distributions.

One method for visualizing spatiotemporal data is the glyph map. Glyph maps are specialized visual tools that condense multivariate data into a single graphical element, or glyph, placed at a specific location on the map. Glyph are a combinaton of two main structural components: spatial coordinates and data value. This type of visualisation allows for simultaneous visualization of local patterns (specific to individual locations) and global patterns (across all locations). One variation of glyph map is Line Glyph. One of the limitation for this type of the plot is that it can only displays changes overtime and can’t effectively capture different component of the data.

To address this limitation, the sugarglider was developed, extending ggplot2 by introducing glyph map visualizations tailored for spatiotemporal data. The package offers two core functions geom_glyph_ribbon() and geom_glyph_segment() which enable users to create ribbon and segment glyph maps. These functions are designed not only to visualize changes over time but also to capture variations in the data. geom_glyph_ribbon() represents intervals between two values, making it ideal for showing ranges or uncertainties, while geom_glyph_segment() connects data points to reveal trends and transitions.

The sugarglider package can be used to visualize seasonal patterns, temporal variations, or any time-dependent measurements at specific locations. Its customizable features, such as adjustable glyph dimensions, global or individual scaling, and the ability to add reference lines or boxes, allow users to tailor their visualizations to meet their specific data analysis needs. This versatility makes sugarglider a valuable tool for researchers and analysts working with spatiotemporal datasets across a variety of fields, including transportation, environmental monitoring, and urban planning.

The following sections will explore how sugarglider enables users to effectively visualize spatiotemporal data, providing practical examples and highlighting the package’s flexibility in creating insightful and customizable glyph maps.

2 Literature Review

2.1 Visualizing spatio-temporal Data

Data visualization is often viewed as a process of translating unstructured raw data into structured representations that facilitate interpretation. Spence (2001) describes visualization as the formation of a mental model or image, which aligns with definitions in the Oxford Dictionary that frame a model as either an example to follow or a simplified mathematical description aiding in calculations and predictions. The general workflow for realizing data visualization consists of four fundamental stages: data collection and storage, preprocessing to transform data into comprehensible information, display rendering via hardware and algorithms, and finally, interpretation through the human perceptual system (Waddell (2004)).

Spatiotemporal visualizations offer a particular advantage by providing a global view of activities, enabling the detection of trends and evolution, which makes them indispensable tools for spatial and temporal analysis, as well as for decision-making (Zhong et al. (2012)). Various techniques are used in spatiotemporal data visualization, such as multiple maps representing different time moments, dynamic maps with user control, change maps highlighting differences between two time points, and the space-time cube, a 3D visualization integrating geographical space and time (andrienko2003exploratory).

2.2 Glyph-maps

Glyph maps are specialized forms of multivariate glyph plots that represent each spatial location with a glyph, summarizing data collected over time at that location. As detailed in the paper by Wickham et al. (2012), glyph maps are a refined type of multivariate glyph plots in which each spatial location is depicted by a single glyph that encapsulates data collected over time at that point. These maps are instrumental in uncovering both local and global patterns, particularly emphasizing temporal relationships.

The construction of a glyph map evolves around a transformation technique that utilizes principles of linear algebra to convert temporal coordinates (minor coordinates) into spatial coordinates (major coordinates). This transformation is implemented in packages such as GGally by Schloerke et al. (2024) and cubble by Zhang et al. (2024), facilitating a more integrated approach to spatio-temporal data visualization.

In order for a plot to be useful for a specific data analysis task, it should make primary comparisons easy to see. Mapping values to position is one of the easiest properties to perceive (Cleveland and McGill (1984)). Glyph-maps facilitate the display of changes in slope, trend, average value, and variance over time. They organize the graphical elements necessary for temporal comparisons. However, glyph-maps, especially when representing linear trends as icons, may be subject to the Zöllner Illusion, which distorts straight lines. Therefore, accurate perception of change requires the implementation of reference frames and scaling (Wickham et al. (2012))

2.3 Extending ggplot2 with ggproto

The architecture of ggplot2 is based on the ggproto system of object-oriented programming. nitially, ggplot2 relied on the proto system for object-oriented tasks, an S3 subclass of the R environment class. Proto objects are manipulated using the proto function, which sets the parent environment, evaluates expressions, and manages lazy evaluation of arguments (Grothendieck et al. (2016)).

As the demand for an official extension mechanism in ggplot2 increased, the limitations of the proto system became evident, prompting the adoption of ggproto. This transition is documented in Wickham (2016), which also introduces how to utilize ggproto objects to extend ggplot2 functionalities.

The creation of a new ggproto object is facilitated by the ggproto() function, which requires the name of the new class and an existing ggproto object from which it will inherit. For instance, to introduce a new statistical transformation, one might create a ggproto that inherits from Stat and Geom. However, merely creating a ggproto object does not make it accessible or useful to the end user.

To bridge this gap, the creation of a layer function is necessary. An example is the new_stat() function, which follows a consistent format: setting defaults in the function arguments, and calling layer(), which handles the distribution of these arguments into either geom parameters, stat parameters, or aesthetics.

While developing ggplot2 extensions, it may seem intuitive to encapsulate extensions as new geoms, as they are frequently used by users to add layers to a plot. However, the diversity in ggplot2’s capabilities often stems more from the variety in statistical transformations (stats) than merely geometric objects (geoms), suggesting a nuanced approach in designing extensions that effectively enhance the plotting system.

3 Software

The sugarglider package extends the capabilities of ggplot2 by introducing functions specifically designed for visualizing seasonal patterns in spatio-temporal data. It includes geom_glyph_ribbon() and geom_glyph_segment() , which represent measurements recorded over time at specific locations through the use of glyph maps. These functions enable clear depictions of seasonal trends by leveraging the combination of x_major and y_major coordinates.

The structure of glyph maps in sugarglider consists of four main layers: the base map, glyph boxes, reference lines, and ribbon or segment glyphs. Additionally, users can create a legend, adding an extra layer to the glyph maps. Apart from the base map, sugarglider offers functionalities to generate all the elements of a comprehensive glyph map, as illustrated in the figure below.

The diagram depicts the structure of a glyph-map. The initial layer represents the base map. Subsequent layers comprise of glyph boxes and reference lines. The fourth layer encompasses the glyph itself, offering users the choice to depict ribbon glyphs. The legend layer is optional. It enabling users to display a legend—a magnified version of one of the glyphs.

Figure 1: The diagram depicts the structure of a glyph-map. The initial layer represents the base map. Subsequent layers comprise of glyph boxes and reference lines. The fourth layer encompasses the glyph itself, offering users the choice to depict ribbon glyphs. The legend layer is optional. It enabling users to display a legend—a magnified version of one of the glyphs.

Each layer can be plotted independently, and the package supports the creation of glyph plots using either ribbon or segment geometries. The core functionality includes:

In addition to these two functions, sugarglider offers several other features that enhance the customization of glyph maps. The add_ref_box() function introduces reference boxes that visually frame individual glyphs, helping to define boundaries and distinguish glyphs from each other. The add_ref_line() function draws a horizontal midpoint for each glyph, facilitating comparisons across data points. The add_glyph_legend() function allows users to display an enlarged version of a randomly chosen glyph in the bottom-left corner of the panel, enabling users to visualize the data range. Lastly, the theme_glyph() function provides a customized theme for glyph maps, built on top of theme_map() from ggthemes. It adjusts the plot’s appearance, including the legend position, text styles, and background settings, to create a clean, visually consistent layout for glyph visualizations.

# Ribbon glyph
vic_temp |>
   ggplot(aes(x_major = long,
              y_major = lat,
              x_minor = month,
              ymin_minor = tmin,
              ymax_minor = tmax)) +
  add_glyph_boxes() +
  add_ref_lines() +
  geom_glyph_ribbon() +
  theme_glyph()

# Segment glyph
vic_temp |>
   ggplot(aes(x_major = long,
              y_major = lat,
              x_minor = month,
              y_minor = tmin,
              yend_minor = tmax)) +
  add_glyph_boxes() +
  add_ref_lines() +
  geom_glyph_segment() +
  theme_glyph()
A comparison between ribbon and segment glyph-maps: Glyph boxes and reference lines have been added to frame each glyph and introduce a line that divides each glyph midway, helping users make inferences about the plot. Additional coding is necessary to create the base map and adjust the width and height of each glyph.

Figure 2: A comparison between ribbon and segment glyph-maps: Glyph boxes and reference lines have been added to frame each glyph and introduce a line that divides each glyph midway, helping users make inferences about the plot. Additional coding is necessary to create the base map and adjust the width and height of each glyph.

The sugarglider package provides various customization options for more flexible visualization. It includes features such as the global_rescale argument, which allows users to choose between global or individual glyph scaling. Users can also adjust the scaling of minor values within grid cells, as well as the overall width and height of glyphs. This ensures that the glyph-map can be finely tuned to meet specific data representation needs. The following section will explore these features in greater detail and provide practical examples that illustrate their application within different visualization contexts.

3.1 Aesthetics

sugarglider provides the same aesthetics for geom_glyph_ribbon() and geom_glyph_segment() as those available in geom_ribbon() and geom_segment() from ggplot2. To include a variable in the glyph plot, it must be specified as an aesthetic. The functions in sugarglider expect spatial coordinates as the major axis and temporal data, along with some measurements, as the minor axis.

To produce glyph-maps, the following aesthetics are required:

Aesthetics Description
x_major,y_major Spatial coordinates that define the position of glyphs.
x_minor Represents temporal data associated with each glyph.
ymin_minor, ymax_minor Used by geom_glyph_ribbon() to establish the lower and upper bounds of the ribbon geometry within each glyph.
y_minor, yend_minor Used by geom_glyph_segment() to set the start and end points of the segment geometry within each glyph.

The functions add_ref_box(), add_ref_line(), and add_geom_legend() are compatible with either ymin_minor, ymax_minor, or y_minor, yend_minor. Additionally, sugarglider introduces several customizable options to further tailor the visual aspects:

Option Default Description
colour "black" Sets the color for line segments and borders.
linewidth 0.5 Specifies the width of the line for borders.
linetype 1 Defines the style of the line for borders.
fill "black" Determines the color of the interior area of the geometries.
alpha 0.8 Controls the transparency level of the glyphs.

3.2 Options

Options within the sugarglider package allow you to tailor the behavior of your visualizations to meet the specific needs of your analysis. The global_rescale argument provides control over whether rescaling should occur globally across all data points or be handled individually for each glyph.

sugarglider also offers a variety of customizable features to enhance the flexibility and precision of visualizations. For example, it facilitates the scaling of minor values within the glyph along both the x and y axes. Users can specify their own rescale function by replacing “identity” with a custom function in x_scale and y_scale. If a user wishes to modify the rescaling function on only one axis, they can replace the value of the corresponding parameter with their chosen function and retain “identity” for the other. In this package, “identity” rescales the minor axes to an interval of [-1,1]. The impact of rescaling on glyphs and its implications for visual interpretation will be thoroughly discussed in the upcoming section.

Additionally, the width and height of the glyphs are adjustable, allowing users to modify the appearance of each glyph to match the dimensions and scaling of the data being visualized. These customization options ensure that sugarglider can adapt to a broad range of data types and requirements, making it a versatile tool for seasonal spatiotemporal data visualization.

Option Default Description
x_scale "identity" This function scales each set of minor values within a grid cell along the x-dimension.
y_scale "identity" This function scales each set of minor values within a grid cell along the y-dimension.
width ggplot2::rel(4) Width of the glyph.
height ggplot2::rel(2.5) Height of the glyph.
global_rescale TRUE Determines whether rescaling is applied globally across all glyphs or individually for each glyph

3.3 Data structure

When using the sugarglider package to create glyph plots, the first step is to ensure that your data is in the correct format. There are two data structures to consider as per Zhang et al. (2024), and one of them is compatible with sugarglider. The package supports data structured in a long format that includes both temporal and spatial elements.

sugarglider includes an dataset called aus_temp, which is sourced from The National Oceanic and Atmospheric Administration (NOAA). This dataset provides a comprehensive set of climate data from 29 stations across Australia for the year 2020. It includes important climate variables such as precipitation and temperature, as well as key spatial elements (longitude and latitude) and temporal elements (month), along with temperature ranges. These temperature ranges are crucial for determining the widths of the ribbon and segment plots in glyph-maps.

glimpse(aus_temp)
Rows: 348
Columns: 7
$ id    <chr> "ASN00001020", "ASN00001020", "ASN00001020", "ASN00001…
$ long  <dbl> 126.3867, 126.3867, 126.3867, 126.3867, 126.3867, 126.…
$ lat   <dbl> -14.0900, -14.0900, -14.0900, -14.0900, -14.0900, -14.…
$ month <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, …
$ tmin  <dbl> 253.4516, 248.6786, 253.6129, 244.0357, 220.4138, 202.…
$ tmax  <dbl> 319.0000, 322.6071, 333.1935, 340.9310, 331.9333, 310.…
$ prcp  <dbl> 163.87096774, 162.74074074, 42.00000000, 21.57142857, …

Datasets may not always include both spatial and temporal elements. Analysts often start with station data that provides geographic locations, recorded variables, and the time periods of these variables. To extract relevant data, they can query the temporal variables for specific stations of interest. In some cases, analysts may start with purely spatial or purely temporal data, which then requires additional elements to transform it into a spatio-temporal format.

For these situations, the cubble package offers functions such as make_cubble() to help users structure their data into cubble objects, which are optimized for use with glyph-maps. This structuring facilitates the creation of detailed and insightful spatio-temporal visualizations, enabling the data to be seamlessly integrated into the sugarglider package.

3.4 Rescale

In sugarglider, rescaling is a crucial preprocessing step applied to the minor axes, which are the data used to plot individual glyphs. This rescaling prepares the data for a linear transformation that maps temporal data onto a spatial representation. This important process will be explored in greater detail in the subsequent section. The rescaling mechanism is governed by two parameters: x_scale and y_scale. The x_scale parameter adjusts the minor values along the x-dimension within each glyph, while y_scale modifies them along the y-dimension.

By default, the rescaling function is set to “identity”, which adjusts the minor axes to fit within the interval [-1, 1]. However, users can customize the rescaling function by replacing the default settings for x_scale and y_scale with their own functions. For example, the following code demonstrates a custom rescale function that transforms values to fit within the interval [0, 1]. When this custom rescale is applied, the resulting ribbon in the plot appears significantly thinner compared to the previous example, which used the default rescaling settings

# Default rescale 
nsw_temp |>
   ggplot(aes(x_major = long,
              y_major = lat,
              x_minor = month,
              ymin_minor = tmin,
              ymax_minor = tmax)) +
  geom_glyph_ribbon() +
  theme_glyph() 

# Custom rescale 
custom_rescale <- function(dx) {
  rng <- range(dx, na.rm = TRUE)
  # Rescale dx to [0,1]
  rescaled <- (dx - rng[1]) / (rng[2] - rng[1])
}

nsw_temp |>
   ggplot(aes(x_major = long,
              y_major = lat,
              x_minor = month,
              ymin_minor = tmin,
              ymax_minor = tmax)) +
  geom_glyph_ribbon(x_scale = custom_rescale,
                    y_scale = custom_rescale) +
  theme_glyph() 
The figure illustrates the effect of rescaling on ribbon glyphs. With the default rescaling, all minor axes are adjusted to fit within the interval [-1, 1], whereas the custom rescale function adjusts the minor axes to the interval [0, 1]. Additional code is required to plot the base map alongside the rescaled glyphs.

Figure 3: The figure illustrates the effect of rescaling on ribbon glyphs. With the default rescaling, all minor axes are adjusted to fit within the interval [-1, 1], whereas the custom rescale function adjusts the minor axes to the interval [0, 1]. Additional code is required to plot the base map alongside the rescaled glyphs.

To fully grasp the impact of rescaling on the mapping of temporal data to glyphs, it’s important to consider how this process applies to both geom_glyph_ribbon() and geom_glyph_segment(). The transformation of spatio-temporal data into visual representations will be explored in greater detail in the next section.

Additionally, sugarglider gives users the flexibility to choose whether rescaling is applied globally across all glyphs or individually for each glyph. This behavior is controlled by the global_rescale parameter, which defaults to TRUE. When global_rescale is set to FALSE, users can implement local rescaling, allowing each glyph to be scaled independently. The difference between global and local rescaling is evident in the following example:

# Global rescale
aus_temp |>
  ggplot(aes(
    x_major = long, 
    y_major = lat, 
    x_minor = month, 
    y_minor = tmin, 
    yend_minor = tmax)) +
  add_glyph_boxes() +
  add_ref_lines() +
  geom_glyph_segment(global_rescale = TRUE) +
  theme_glyph()

# Local Rescale
aus_temp |>
  ggplot(aes(
    x_major = long, 
    y_major = lat, 
    x_minor = month, 
    y_minor = tmin, 
    yend_minor = tmax)) +
  add_glyph_boxes() +
  add_ref_lines() +
  geom_glyph_segment(global_rescale = FALSE) +
  theme_glyph()
The figure highlights the impact of global and local rescaling on segment glyphs. With global rescaling, the temperature range is uniform across all glyphs, allowing users to compare variation in temperature between stations across Australia. In contrast, with local rescaling, the temperature range varies between glyphs, enabling more detailed insights into the temperature distribution at individual stations. Additional code is required to plot the base map and specify the xlim for the sf coordinates.

Figure 4: The figure highlights the impact of global and local rescaling on segment glyphs. With global rescaling, the temperature range is uniform across all glyphs, allowing users to compare variation in temperature between stations across Australia. In contrast, with local rescaling, the temperature range varies between glyphs, enabling more detailed insights into the temperature distribution at individual stations. Additional code is required to plot the base map and specify the xlim for the sf coordinates.

The rescaling process in sugarglider involves several steps. First, the function checks for any custom scaling based on the x_scale and y_scale parameters. It then groups the data based on the designated grouping variable to ensure that each glyph is drawn as a distinct path. If global_rescale is set to TRUE, the data is ungrouped before rescaling the minor axes, ensuring consistent scaling across all glyphs. Conversely, if global_rescale is set to FALSE, the data remains grouped, resulting in local scaling for each individual glyph.

For both geom_glyph_ribbon() and geom_glyph_segment(), rescaling is applied separately to the ymin_minor and ymax_minor values, or to the `y_minor and yend_minor values, respectively. This ensures that both the lower and upper bounds are adjusted independently to fit within the specified scale, maintaining accuracy and clarity in the visualization.

3.5 Spatial-temporal transformation

Diagram highlights how spatial data (geographical location) is combined with temporal data (measurements over time) to create a spatio-temporal visualization. In sugarglider, the transformation maps each station's temporal measurements into a visual glyph, allowing users to see patterns over time across different spatial locations.

Figure 5: Diagram highlights how spatial data (geographical location) is combined with temporal data (measurements over time) to create a spatio-temporal visualization. In sugarglider, the transformation maps each station’s temporal measurements into a visual glyph, allowing users to see patterns over time across different spatial locations.

The construction of a glyph map, as described in Wickham et al. (2012), involves a linear combination of two key structural components: spatial location and data values. In this context, the major axes represent the spatial coordinates, latitude (\(y_{major}\)) and longitude (\(x_{major}\)), while the minor axes correspond to time (\(x_{minor}\)) and a measurement of interest (\(ymax_{minor}\) and \(ymin_{minor}\)). For segment glyphs, the measurement is represented by \(y_{minor}\) and \(yend_{minor}\).

Once the minor axes are rescaled to the interval [-1, 1], the final coordinates for the ribbon glyph are determined through a linear transformation, as follows:

\[\begin{align} \text{x} &= \text{x}_{\text{major}} + \frac{\text{width}}{2} \cdot \text{x}_{\text{minor}} \\ \text{ymin} &= \text{y}_{\text{major}} + \frac{\text{height}}{2} \cdot \text{ymin}_{\text{minor}} \\ \text{ymax} &= \text{y}_{\text{major}} + \frac{\text{height}}{2} \cdot \text{ymax}_{\text{minor}} \end{align}\]

Similarly, the coordinates for the segment glyph are computed as:

\[\begin{align} \text{x} &= \text{x}_{\text{major}} + \frac{\text{width}}{2} \cdot \text{x}_{\text{minor}} \\ \text{y} &= \text{y}_{\text{major}} + \frac{\text{height}}{2} \cdot \text{y}_{\text{minor}} \\ \text{yend} &= \text{y}_{\text{major}} + \frac{\text{height}}{2} \cdot \text{yend}_{\text{minor}} \end{align}\]

This linear transformation ensures that the temporal and data components are properly aligned with the spatial coordinates, enabling a clear and accurate visualization of spatio-temporal data.

3.6 Examples

The aus_temp dataset is used to demonstrate the functionality of the sugarglider package. Using the default rescaling parameters, we can visualize temperature data with geom_glyph_segment(), alongside geom_point() that mark the location of each weather station. Each segment glyph represents local climate data, providing an intuitive way to explore temperature variations across Australia.

The default identity scaling function is applied to each set of minor values within each glyph. This method centers the glyphs both vertically and horizontally based on the station’s coordinates and adjusts the minor axes to fit within the interval [-1, 1]. This ensures that the glyphs are appropriately scaled and sized to fit within the defined dimensions of the glyph.

aus_temp |>
  ggplot(aes(
    x_major = long, 
    y_major = lat, 
    x_minor = month, 
    y_minor = tmin, 
    yend_minor = tmax)) +
  add_glyph_boxes() +
  geom_point(aes(x = long, y = lat,
                 color = "Weather Station")) +
  geom_glyph_segment() +
  theme_glyph()
Daily temperature variations across Australian weather stations are visualized using Geom points to display the weather station locations. Additional codes are required for the base map.

Figure 6: Daily temperature variations across Australian weather stations are visualized using Geom points to display the weather station locations. Additional codes are required for the base map.

The previous segment glyph map used global rescaling (enabled by default), meaning the glyphs were sized relative to one another based on their data values. By disabling global rescaling, we can observe the effects of local rescaling, where each glyph is resized according to its individual values.

Building on the analysis, precipitation data across Australia can be visualized using geom_glyph_ribbon(). The glyphs are color-coded to represent different levels of rainfall, while reference lines and glyph boxes are added to improve clarity and facilitate easy comparisons of precipitation levels across the country.

aus_temp |>
   group_by(id) |>
   mutate(prcp = mean(prcp, na.rm = TRUE)) |>
   ggplot(aes(x_major = long, y_major = lat,
              x_minor = month, ymin_minor = tmin,
              ymax_minor = tmax, 
              fill = prcp, color = prcp)) +
   add_glyph_boxes() +
   add_ref_lines() +
   geom_glyph_ribbon() +
   theme_glyph()
Precipitation and Temperature Ranges Across Australia. Each glyph is color coded based on the level of precipitation, with sky blue representing the lowest and navy blue representing the highest. Additional codes are needed for the base map and additional theme customization.

Figure 7: Precipitation and Temperature Ranges Across Australia. Each glyph is color coded based on the level of precipitation, with sky blue representing the lowest and navy blue representing the highest. Additional codes are needed for the base map and additional theme customization.

To compare temperature trends across different years for specific regions in Victoria, geom_glyph_ribbon() offers an effective way to visualize how temperatures have evolved over time. Each year is distinguished by a different color, making the trends clear and easy to interpret.

historical_temp |> 
  filter(id %in% c("ASN00026021", "ASN00085291", "ASN00084143")) |>
   ggplot(aes(color = factor(year), fill = factor(year),
              group = interaction(year,id),
              x_major = long, y_major = lat,
              x_minor = month, ymin_minor = tmin, 
              ymax_minor = tmax)) +
   add_glyph_boxes() +
   add_ref_lines() +
   geom_glyph_ribbon() +
   theme_glyph()
Temperature Trends in Selected Victorian Weather Stations. Additional codes are required for the base map, title, and further theme customization.

Figure 8: Temperature Trends in Selected Victorian Weather Stations. Additional codes are required for the base map, title, and further theme customization.

To further improve map readability, the add_geom_legend() function integrates an enlarged version of one of the glyphs in the bottom left corner of the plot. This legend helps users interpret the data scale more effectively.

In the example below, ribbon glyph-map is created using geom_glyph_ribbon() and overlaid on a basemap to depict daily temperature variations across Australian weather stations. A legend is added via add_glyph_legend(), allowing users to easily interpret the range of daily temperature values based on a randomly selected weather station. Since the legend data is drawn from a single, randomly chosen station, it is important to set a seed for reproducibility to ensure consistent results.

set.seed(28493)
aus_temp |>
   ggplot(aes(x_major = long, y_major = lat,
              x_minor = month, ymin_minor = tmin,
              ymax_minor = tmax)) +
  add_glyph_boxes() +
  add_ref_lines() +
  add_glyph_legend() +
  geom_glyph_ribbon() +
  theme_glyph() 
Temperature Ranges Across Australia with Glyph Legend. Additional code is needed for the base map and additional theme customization.

Figure 9: Temperature Ranges Across Australia with Glyph Legend. Additional code is needed for the base map and additional theme customization.

4 Application

Four examples are provided to showcase different aspects of the sugarglider package. The first two examples utilize the Train Service Passenger Counts data from DataVic, while the latter two are based on the Airline Flight Delay and Cancellation data from Kaggle. The first example demonstrates the creation of a ribbon glyph map with Leaflet to explore variations in train service passenger counts. The second example uses a segment glyph to analyze daily variations in train passenger numbers in Melbourne City LGA. The third example features an interactive segment glyph, developed with ggiraph, to analyze flight variations at the top 19 US airports with high cancellation rates. Finally, the fourth example uses ribbon glyphs to compare flight variations between the western and southern regions.

4.1 Train Service Passenger Counts

This dataset, provided by DataVic, is managed by the Victorian Department of Premier and Cabinet and is owned by the State Government of Victoria. The Train Service Passenger Counts dataset offers estimated numbers of passengers boarding and alighting from Metro Train and/or Regional Train services for the year 2023-2024. These estimates include all individuals aged 5 and over, excluding train drivers and station staff. For the purposes of this analysis, the data has been aggregated on an hourly basis and separated into maximum and minimum passenger counts across weekdays, weekends, and holidays (both public and school holidays).

station_name hour long lat services mode min_weekday max_weekday min_weekend max_weekend min_holiday max_holiday
Aircraft 05:00 144.7608 -37.86661 1 metro 2 4 0 2 0 3
Aircraft 06:00 144.7608 -37.86661 1 metro 14 19 1 5 0 14
Aircraft 07:00 144.7608 -37.86661 1 metro 19 24 2 4 2 18
Aircraft 08:00 144.7608 -37.86661 1 metro 11 16 2 6 1 13
Aircraft 09:00 144.7608 -37.86661 1 metro 6 10 4 8 1 7
Aircraft 10:00 144.7608 -37.86661 1 metro 6 11 5 10 2 9

Victoria train station patronage

Interactive graphics are particularly useful when working with spatio-temporal data as they allow users to explore the data from multiple perspectives. The cubble package, exemplifies this by creating linked interactive plots using crosswalk::bscols() (Zhang et al. (2024)). In this paper, we will demonstrate how to create interactive glyph maps using Leaflet.

Making glyph map using leaflet is useful for high density data where a lot of the observation are either clustered together or overlap with each other. With leaflet’s ability to pan in and out, we are able to display all the glyphs regardless of how close they are to each other. This can’t be achieve with static maps since we can only plot a certain number of glyph before it gets overcrowed.

To create interactive glyph maps with Leaflet, we first need to save each glyph as a image and then add these to the Leaflet basemap as icons. The process begins by creating a list of all unique train stations that service both metro and vline. We then iterate over each station and generate ribbon glyphs using geom_glyph_ribbon(), add_glyph_boxes() and add_ref_lines(). Each glyph is saved in PNG format, and the file paths for all the PNGs are stored in an object for the next step.

# Generate a list of unique train stations
df_station <- train$station_name |> unique()

# Generate PNG of all the ribbon glyph
purrr::map(1:length(df_station), function(i) {
  dt <- train |> filter(station_name == df_station[i])
  p <- dt |>
  ggplot(aes(x_major = long, y_major = lat,
                   x_minor = month_year, ymin_minor = min_monthly,
                   ymax_minor = max_monthly)) +
    add_glyph_boxes(color = "#FFAD60", 
                    fill = "#FFEEAD", alpha = 0.5,
                    linewidth = 1, width = 3, height  =2) +
    add_ref_lines(color = "#FFAD60", alpha = 1,
                  linewidth = 1, width = 3, height  =2) +
    geom_glyph_ribbon(color = "#A66E38", fill = "#A66E38",
                      width = 3, height  =2) +
    theme_void() 
  
  file_path <- paste0("figures/glyph_", df_station[i], ".png")
  ggsave(file_path, plot = p, width = 3, height = 2, units = "in", dpi = 300,
         bg = "transparent")
  return(file_path)

  }) -> train_png 

To create the base map for Leaflet, we use the leaflet() function and addProviderTiles() with CartoDB.Positron as the provider to achieve a light, grey map aesthetic. Additionally, a scale bar is added at the bottom left corner for reference.

# Create a leaflet map 
leaflet_map <- leaflet() |>
  addProviderTiles("CartoDB.Positron") |>
  addScaleBar(position = "bottomleft")

Next, we iterate through all the PNG files and convert them into icons using the makeIcon() function. Users can control the dimensions of each icon by modifying the iconWidth and iconHeight arguments. The final step is to add each icon to the Leaflet map using addMarkers(). Within the options argument of addMarkers, users can fine-tune each glyph’s display properties, such as the opacity level. The label argument allows users to specify the information they wish to display with the hover-over effect, which in this example is set to display the station ID.

# Loop through the PNG files and add them to the map
for (i in seq_along(train_png)) {
  icon <- makeIcon(iconUrl = train_png[i], iconWidth = 100, iconHeight = 60)

  dt <- train |> filter(station_name == df_station[i])
  leaflet_map <- leaflet_map |>
    addMarkers(lng = dt$long[1], lat = dt$lat[1], icon = icon,
               label = dt$station_name, options = markerOptions(opacity = 0.1))
}

leaflet_map
Screenshot of the hourly train station traffic in Melbourne. Each glyph represent hourly traffic, with peaks occurring during typical commuting hours (morning and evening rush hours)

Figure 10: Screenshot of the hourly train station traffic in Melbourne. Each glyph represent hourly traffic, with peaks occurring during typical commuting hours (morning and evening rush hours)

The plot presents a visualization of hourly train traffic across various stations in Victoria. Each station is represented by a small glyph or graph, likely showing the fluctuations in train traffic at that particular location throughout the day.

The central Melbourne area, especially near major hubs like Flinders Street Station, shows the highest peaks in train traffic, as indicated by the larger glyphs. This reflects the role of the central business district (CBD) as the primary transit hub for the city. In addition, stations in the CBD appear to have more pronounced peaks, suggesting a higher number of services during peak hours compared to suburban stations, which might have flatter curves, reflecting more consistent traffic throughout the day.

As we move outward from the city center, the glyphs show smaller peaks and more consistent, lower levels of traffic. This suggests that train services are less frequent or less utilized in the outer suburbs compared to the central areas.

Some stations in the eastern and southeastern parts of Melbourne also show moderate peaks, indicating important transit corridors for residents commuting into the city center. These stations are along major rail lines, such as the Cranbourne/Pakenham and Lilydale/Belgrave lines, which serve large suburban populations.

Daily Variation in Train Passenger in Melbourne City

weekday <- station |>
  group_by(station_name) |>
  ggplot(aes(x_major = long, y_major = lat,
             x_minor = hour, y_minor = min_weekday,
             yend_minor = max_weekday,
             color = services, fill = services)) +
  add_glyph_boxes() +
  add_ref_lines() +
  geom_glyph_segment(global_rescale = FALSE) +
  theme_glyph() +
  labs(title = "Weekday Patronage") +
  theme_glyph() 


weekend <- station |>
  group_by(station_name) |>
  ggplot(aes(x_major = long, y_major = lat,
             x_minor = hour, y_minor = min_weekend,
             yend_minor = max_weekend,
             fill = services, color = services)) +
  add_glyph_boxes() +
  add_ref_lines() +
  geom_glyph_segment(global_rescale = FALSE) +
  theme_glyph() +
  labs(title = "Weekday Patronage") +
  theme_glyph() 
Patronage at various train stations in Melbourne City LGA on weekdays versus weekends. Each glyph is color-coded by the number of train services at each station. Additional code is required to construct the base map and customize the plot further.

Figure 11: Patronage at various train stations in Melbourne City LGA on weekdays versus weekends. Each glyph is color-coded by the number of train services at each station. Additional code is required to construct the base map and customize the plot further.

The plot provides valuable insights into train service traffic across Melbourne’s City LGA, with a clear distinction between weekday and weekend patronage. On weekdays, a higher volume of train services is observed, particularly in central areas where stations handle the most traffic. This is reflected by the darker shades of color, indicating up to 20 services at certain stations, especially in the southeastern part of the city. The weekend patronage, by contrast, exhibits a lighter overall color gradient, signaling a reduction in services. While some central stations still maintain moderate traffic levels, most stations experience a noticeable decrease in the number of trains.

A key observation is the difference in traffic between core and peripheral stations. Central stations serve as major transit hubs, handling significantly more services on both weekdays and weekends, while stations on the periphery see much lighter traffic. This suggests that the city’s transit infrastructure is designed to cater to higher demand in central areas, with peripheral stations serving fewer passengers. Additionally, spatial differences are apparent, with certain northern stations consistently experiencing lower traffic regardless of the day, possibly due to differences in population density, land use, or connectivity.

Overall, the plot illustrates a typical pattern of public transport use, with higher demand and service frequency during weekdays, especially in the central parts of the city, and a reduction in services over the weekend. This reflects the reliance on public transport for weekday commuting and the lower travel demand on weekends, which could inform future decisions on scheduling and resource allocation in the transit system.

4.2 Flight Delay and Cancellation

The U.S. Department of Transportation (DOT) Bureau of Transportation Statistics monitors the on-time performance of domestic flights operated by major U.S. airlines. Each month, the DOT publishes the Air Travel Consumer Report, summarizing on-time, delayed, canceled, and diverted flights.

This dataset, obtained from the Kaggle Airline Flight Delay and Cancellation data, has been processed and aggregated to display the minimum and maximum number of flights originating from the top 10 U.S. airports with the highest cancellation rates.

origin month long lat min_flights max_flights
ATL 1 -84.42806 33.63667 2321 3279
ATL 2 -84.42806 33.63667 1953 3074
ATL 3 -84.42806 33.63667 2667 3476
ATL 4 -84.42806 33.63667 896 3361
ATL 5 -84.42806 33.63667 785 3441
ATL 6 -84.42806 33.63667 1119 3473

Monthly Flight Variation in US Airport with High Cancellation

This example demonstrates the functionality of geom_glyph_segment(), which shows the monthly range of flights for each airport and provides insights into how flight numbers change over time. Additionally, geom_glyph_ribbon() is used to visualize the variation between the minimum and maximum number of flights from each airport, offering a clear depiction of the spread in flight activity.

User can generate interactive glyphs with ggiraph::girafe using sugarglider. First, users need to specify tooltips, which are the details displayed when hovering over the glyphs. In this example, the tooltips consist of station id, month, minimum and maximum temperature across all month.

Tooltips needs to be provided in the aesthetic to in the tooltip argument. User can then plot their desired glyph map and save it as a ggplot object. This ggplot object is then converted into girafe object using the girafe() function.

# Specify tooltip for ggiraph 
flights <- flights |>
  mutate(tooltip = paste("origin: ",origin,
                         "\nmonth: ", month,
                         "\nmin_flights: ", min_flights,
                         "\nmax_flights: ", max_flights))

fl <- flights |> 
  ggplot(aes(x_major = long, y_major = lat,
             x_minor = month, y_minor = min_flights,
             yend_minor = max_flights,
             tooltip = tooltip)) + 
  add_glyph_boxes() +
  add_ref_lines() +
  geom_glyph_segment() +
  theme_glyph()

# Interactive plot using ggiraph
girafe(ggobj = fl)
Monthly Flight Variability Based on the top 10 US airports with high cancellation rates. Additional codes are needed for the base map and additional theme customization.

Figure 12: Monthly Flight Variability Based on the top 10 US airports with high cancellation rates. Additional codes are needed for the base map and additional theme customization.

This visualization shows the range of departing flights from each airport on a map. Each line represents the flight range for a specific airport, with the length of the line showing the variability in flight numbers across different locations. Notably, airports like ATL and ORD generally handle a higher volume of departing flights compared to airports such as MCO and PHX. All airports exhibit fluctuations throughout the year, with wider flight intervals in the middle of the year and narrower intervals during the holiday season at year-end.

The graph highlights that airports in certain regions experience more variability than others. While the line plots provide a clear estimate of this variability, the use of geom_glyph_ribbon() enhances understanding by visually representing the gap between the minimum and maximum flights. Wider ribbons indicate greater month-to-month fluctuations in flight operations.

Flight Pattern Between Western and Eastern Region

# South Region
flights |> 
  filter(origin %in% c("ATL", "CLT", "MCO", "DFW")) |>
  ggplot(aes(x_major = long, y_major = lat,
             x_minor = month, ymin_minor = min_flights,
             ymax_minor = max_flights)) + 
  add_glyph_boxes() +
  add_ref_lines() +
  geom_glyph_ribbon() +
  theme_glyph()

# West region
flights |> 
  filter(origin %in% c("PHX", "LAS", "LAX", "SEA")) |>
  ggplot(aes(x_major = long, y_major = lat,
             x_minor = month, ymin_minor = min_flights,
             ymax_minor = max_flights)) + 
  add_glyph_boxes() +
  add_ref_lines() +
  geom_glyph_ribbon() +
  theme_glyph()
Comparison of flight patterns between the western and southern regions. Additional codes are needed for the base map and additional theme customization.

Figure 13: Comparison of flight patterns between the western and southern regions. Additional codes are needed for the base map and additional theme customization.

geom_glyph_ribbon() effectively highlights the disparities in flight volume between regions. The Western region exhibits a greater variation in the number of flights compared to the Southern region, as evidenced by the thicker ribbons. Notably, both regions display a significant increase in flight volume discrepancies during the mid-year, reinforcing the findings observed with geom_glyph_segment().

5 Discussion

This paper introduces the R package sugarglider, designed to create glyph maps for visualizing spatio-temporal data using customizable ribbon and segment geometries. The package provides two main functions: geom_glyph_ribbon() and geom_glyph_segment(), which generate ribbon and segment glyphs respectively. One of the key features of sugarglider is its flexibility in allowing users to customize their visualizations. For example, the global_rescale argument enables users to control whether the rescaling of glyphs should be applied globally across all data points or handled individually for each glyph. Additionally, users can adjust the width and height of the glyphs, ensuring that the appearance of each glyph can be tailored to the specific dimensions and scaling of the data being visualized. The package supports both static and interactive graphics, and several examples are included to demonstrate how sugarglider can be effectively used for spatio-temporal data visualization.

Two potential directions for future development include handling different data types and improving the handling of glyph overlap. Currently, sugarglider expects data in a tidy format, but expanding its functionality to work with lists of vectors or matrices would enhance its usability. Additionally, static maps can become overcrowded when too many glyphs are plotted, which can make it difficult to interpret the visualizations. A possible solution is to collapse overlapping glyphs into a single glyph, with an accompanying data summary for all the collapsed glyphs. These improvements would increase the flexibility and scalability of sugarglider for a wider range of data visualization tasks.

6 Acknowledgements

W. S. Cleveland and R. McGill. Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79(387): 531–554, 1984. URL http://www.jstor.org/stable/2288400 [online; last accessed October 3, 2024].
G. Grothendieck, L. Kates and T. Petzoldt. Proto: Prototype object-based programming. 2016. URL https://CRAN.R-project.org/package=proto. R package version 1.0.0.
S. Kim, S. Jeong, I. Woo, Y. Jang, R. Maciejewski and D. S. Ebert. Data flow analysis and visualization for spatiotemporal statistical data without trajectory information. IEEE transactions on visualization and computer graphics, 24(3): 1287–1300, 2017.
B. Schloerke, D. Cook, J. Larmarange, F. Briatte, M. Marbach, E. Thoen, A. Elberg and J. Crowley. GGally: Extension to ’ggplot2’. 2024. URL https://CRAN.R-project.org/package=GGally. R package version 2.2.1.
R. Spence. Information visualization. 2001.
P. Waddell. Introduction to urban simulation: Design and development of operational models. Transport Geography and Spatial Systems/Pergammon Press, 2004.
H. Wickham. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York, 2016. URL https://ggplot2.tidyverse.org.
H. Wickham, H. Hofmann, C. Wickham and D. Cook. Glyph-maps for visually exploring temporal patterns in climate data and models. Environmetrics, 23(5): 382–393, 2012.
H. S. Zhang, D. Cook, U. Laa, N. Langrené and P. Menéndez. Cubble: An r package for organizing and wrangling multivariate spatio-temporal data. Journal of Statistical Software, 110(7): 1–27, 2024. URL https://www.jstatsoft.org/index.php/jss/article/view/v110i07.
C. Zhong, T. Wang, W. Zeng and S. Müller Arisona. Spatiotemporal visualisation: A survey and outlook. Digital Urban Modeling and Simulation, 299–317, 2012.

References

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Po, et al., "sugarglider: Create glyphmaps of spatio-temporal data", The R Journal, 2024

BibTeX citation

@article{sugarglider,
  author = {Po, Maliny and Yang, Nathan Shuyuan and Zhang, H. Sherry and Cook, Dianne},
  title = {sugarglider: Create glyphmaps of spatio-temporal data},
  journal = {The R Journal},
  year = {2024},
  issn = {2073-4859},
  pages = {1}
}